StyleTalk: One-Shot Talking Head Generation with Controllable Speaking Styles


Abstract

Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they are still unable to generate diverse speaking styles in the final videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns from a style reference video and encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and the style code. In order to better integrate the reference speaking style into the generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of its feed-forward layers accordingly. Thanks to this adaptation mechanism, the speaking style can be better embedded into the synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
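The abstract's key mechanism is a feed-forward layer whose weights are modulated by the encoded style code. One plausible reading of this idea is a small bank of candidate weight matrices blended by a softmax gate driven by the style code. The sketch below illustrates that reading only; the dimensions, the gating projection, and the bank size `K` are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper).
D_MODEL, D_FF, D_STYLE, K = 16, 32, 8, 4

# K candidate feed-forward weight sets; the style code selects a blend of them.
W1_bank = rng.normal(scale=0.1, size=(K, D_MODEL, D_FF))
W2_bank = rng.normal(scale=0.1, size=(K, D_FF, D_MODEL))
gate_proj = rng.normal(scale=0.1, size=(D_STYLE, K))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def style_adaptive_ffn(x, style_code):
    """Feed-forward layer whose weights are modulated by a style code.

    The style code is projected to K mixing logits; a softmax over them
    blends K candidate weight matrices into the effective FFN weights.
    """
    alpha = softmax(style_code @ gate_proj)       # (K,) mixing weights
    W1 = np.tensordot(alpha, W1_bank, axes=1)     # blended (D_MODEL, D_FF)
    W2 = np.tensordot(alpha, W2_bank, axes=1)     # blended (D_FF, D_MODEL)
    h = np.maximum(x @ W1, 0.0)                   # ReLU hidden layer
    return h @ W2

tokens = rng.normal(size=(5, D_MODEL))            # 5 content tokens
style = rng.normal(size=(D_STYLE,))               # encoded style code
out = style_adaptive_ffn(tokens, style)
print(out.shape)                                  # (5, 16)
```

Because the gate is a function of the style code alone, two different style codes yield two different effective feed-forward layers over the same speech content, which is the behaviour the abstract attributes to its style-aware adaptation.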


Similar Articles

Talking to machines (statistically speaking)

Statistical methods have long been the dominant approach in speech recognition and probabilistic modelling in ASR is now a mature technology. The use of statistical methods in other areas of spoken dialogue is however more recent and rather less mature. This paper reviews spoken dialogue systems from a statistical modelling perspective. The complete system is first presented as a partially obse...


Still talking to machines (cognitively speaking)

This overview article reviews the structure of a fully statistical spoken dialogue system (SDS), using as illustration, various systems and components built at Cambridge over the last few years. Most of the components in an SDS are essentially classifiers which can be trained using supervised learning. However, the dialogue management component must track the state of the dialogue and optimise ...


Modelling 'Talking Head' Behaviour

We describe a generative model of ‘talking head’ facial behaviour, intended for use in both video synthesis and model-based interpretation. The model is learnt, without supervision, from talking head video, parameterised by tracking with an Active Appearance Model (AAM). We present an integrated probabilistic framework for capturing both the short-term visual dynamics and longer-term behavioural...


Animated talking head with personalized 3D head model

Natural Human-Computer Interface requires integration of realistic audio and visual information for perception and display. An example of such an interface is an animated talking head displayed on the computer screen in the form of a human-like computer agent. This system converts text to acoustic speech with synchronized animation of mouth movements. The talking head is based on a generic 3D h...


Speaker verification with elicited speaking styles in the VeriVox project

Several experiments addressing within-speaker variation in speaker verification have been performed. To elicit speaker variation, speaking-behaviour elicitation software was developed. It was found that if an ASV system was trained on varied speech, speaker verification on even more varied speech improved significantly.



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i2.25280